7 research outputs found

    Automatically evolving rule induction algorithms with grammar-based genetic programming

    In the last 30 years, research in the field of rule induction algorithms has produced a large number of algorithms. However, these algorithms are usually obtained by combining a basic rule induction algorithm (typically following the sequential covering approach) with new evaluation functions, pruning methods and stopping criteria for refining or producing rules, generating many "new" and more sophisticated sequential covering algorithms. We cannot deny that these attempts to improve the basic sequential covering approach have succeeded. Hence, if manually changing these major components of rule induction algorithms can result in new, significantly better ones, why not automate this process to make it more cost-effective? This is the core idea of this work: to automate the process of designing rule induction algorithms by means of grammar-based genetic programming. Grammar-based Genetic Programming (GGP) is a special type of evolutionary algorithm used to automatically evolve computer programs. The most interesting feature of this type of algorithm is that it incorporates a grammar into its search mechanism, which expresses prior knowledge about the problem being solved. Since we have a lot of prior knowledge about how humans design rule induction algorithms, this type of algorithm is intuitively a suitable tool to automatically evolve rule induction algorithms. The grammar given to the proposed GGP system includes knowledge about how humans design rule induction algorithms, and also presents some new elements which could work in rule induction algorithms but, to the best of our knowledge, were not previously tested. The GGP system aims to evolve rule induction algorithms under two different frameworks, as follows. In the first framework, the GGP is used to evolve robust rule induction algorithms, i.e., algorithms designed to be applied to virtually any classification data set, like a manually designed rule induction algorithm.
In the second framework, the GGP is applied to evolve rule induction algorithms tailored to a specific application domain, i.e., rule induction algorithms tailored to a single data set. Note that the latter framework is hardly feasible on a large scale in the case of conventional, manually designed algorithms, since the number of classification data sets greatly outnumbers the number of rule induction algorithm designers. However, it is clearly feasible on a large scale when using the proposed system, which automates the process of rule induction algorithm design and implementation. Overall, extensive computational experiments with 20 UCI data sets and 5 bioinformatics data sets showed that effective rule induction algorithms can be automatically generated using the GGP in both frameworks. Moreover, the automatically evolved rule induction algorithms were shown to be competitive with (and overall slightly better than) four well-known manually designed rule induction algorithms when comparing their predictive accuracies. The proposed GGP system was also compared to a grammar-based hill-climbing system, and experimental results showed that the GGP system is a more effective method to evolve rule induction algorithms than the grammar-based hill-climbing method. Finally, a multi-objective version of the GGP (based on the concept of Pareto dominance) was also proposed, and experiments were performed to evolve robust rule induction algorithms which generate both accurate and simple models. The results showed that in most of the cases the GGP system can produce rule induction algorithms which are competitive in predictive accuracy with well-known human-designed rule induction algorithms, but generate simpler classification models, i.e., smaller rule sets that are intuitively easier for the user to interpret.
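
    The core mechanism of a GGP system, deriving a candidate algorithm from a grammar, can be sketched in a few lines. The grammar below is a hypothetical miniature (the thesis's actual grammar covers full rule induction algorithms, with evaluation functions, refinement operators and so on); it only illustrates how a random derivation turns non-terminals into a concrete configuration of components.

    ```python
    import random

    # Hypothetical miniature grammar: an "algorithm" is a choice of covering
    # strategy, pruning method and stopping criterion.
    GRAMMAR = {
        "<algorithm>": [["<covering>", "<pruning>", "<stopping>"]],
        "<covering>": [["sequential-covering"], ["beam-covering"]],
        "<pruning>": [["no-pruning"], ["reduced-error-pruning"]],
        "<stopping>": [["min-coverage"], ["significance-test"]],
    }

    def derive(symbol, rng):
        """Expand a symbol into a list of terminals via random production choices."""
        if symbol not in GRAMMAR:
            return [symbol]  # terminal symbol: emit as-is
        production = rng.choice(GRAMMAR[symbol])
        out = []
        for s in production:
            out.extend(derive(s, rng))
        return out

    rng = random.Random(0)
    individual = derive("<algorithm>", rng)  # one candidate rule induction algorithm
    print(individual)
    ```

    In a full GGP system each such individual would be decoded into a runnable rule induction algorithm, evaluated on data sets to obtain a fitness, and evolved via grammar-respecting crossover and mutation.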

    Analysing Symbolic Regression Benchmarks under a Meta-Learning Approach

    The definition of a concise and effective testbed for Genetic Programming (GP) is a recurrent matter in the research community. This paper takes a new step in this direction, proposing a different approach to quantitatively measure the quality of symbolic regression benchmarks. The proposed approach is based on meta-learning and uses a set of dataset meta-features, such as the number of examples or output skewness, to describe the datasets. Our idea is to correlate these meta-features with the errors obtained by a GP method. These meta-features define a space of benchmarks that should, ideally, have datasets (points) covering different regions of the space. An initial analysis of 63 datasets showed that current benchmarks are concentrated in a small region of this benchmark space. We also found that the number of instances and the output skewness are the most relevant meta-features for GP output error. Both conclusions can help define which datasets should compose an effective testbed for symbolic regression methods. Comment: 8 pages, 3 figures. Proceedings of the Genetic and Evolutionary Computation Conference Companion, Kyoto, Japan.
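
    The two meta-features the abstract names, number of examples and output skewness, are straightforward to compute; a minimal sketch (with a made-up output vector, not data from the paper):

    ```python
    def skewness(values):
        """Fisher-Pearson coefficient of skewness (population moments)."""
        n = len(values)
        mean = sum(values) / n
        m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
        m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
        return m3 / (m2 ** 1.5)

    def meta_features(outputs):
        """Describe a regression dataset by its target values alone."""
        return {"n_examples": len(outputs), "output_skewness": skewness(outputs)}

    # Hypothetical target values of one benchmark dataset; the outlier 5.0
    # makes the output distribution right-skewed.
    y = [1.0, 1.1, 0.9, 1.2, 5.0]
    feats = meta_features(y)
    ```

    In the meta-learning setting, such feature vectors are computed for every benchmark dataset and correlated with the GP method's error, so that under-covered regions of the benchmark space become visible.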

    Automatically evolving rule induction algorithms with grammar-based genetic programming

    No full text
    EThOS - Electronic Theses Online Service (GB, United Kingdom)

    Towards Automated Lymphoma Prognosis based on PET Images

    No full text
    Electronic version (8 pp.). To appear. International audience.

    Exploring multiple evidence to infer users' location in Twitter.

    No full text
    Online social networks are valuable sources of information to monitor real-time events, such as earthquakes and epidemics. For this type of surveillance, users' location is an essential piece of information, but a substantial number of users choose not to disclose their geographical location. However, characteristics of the users' behavior, such as the friends they associate with and the types of messages published, may hint at their spatial location. In this paper, we propose a method to infer the spatial location of Twitter users. Unlike the approaches proposed so far, it incorporates two sources of information to learn geographical position: the text posted by users and their friendship network. We propose a probabilistic approach that jointly models the geographical labels and Twitter texts of users organized in the form of a graph representing the friendship network. We use the Markov random field probability model to represent the network, and learning is carried out through a Markov Chain Monte Carlo simulation technique to approximate the posterior probability distribution of the missing geographical labels. We show the accuracy of the algorithm in a large dataset of Twitter users, where the ground truth is the location given by GPS. The method presents promising results, with little sensitivity to parameters and high values of precision.
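
    The label-inference step described above can be sketched with Gibbs sampling on a toy friendship graph (the graph, locations and potential below are entirely hypothetical; the paper's model additionally conditions on the text each user posts):

    ```python
    import random
    from collections import Counter

    # Toy friendship graph: "b" has no GPS label; the others are observed.
    edges = {"a": ["b", "c"], "b": ["a", "c", "d"], "c": ["a", "b"], "d": ["b"]}
    observed = {"a": "NYC", "c": "NYC", "d": "LA"}
    locations = ["NYC", "LA"]

    labels = dict(observed)
    labels["b"] = "LA"  # arbitrary initialisation of the missing label

    rng = random.Random(42)
    samples = Counter()
    for step in range(2000):
        for node in edges:
            if node in observed:
                continue  # observed labels stay fixed
            # Conditional proportional to agreement with neighbours, plus smoothing.
            weights = [sum(labels[n] == loc for n in edges[node]) + 0.1
                       for loc in locations]
            r = rng.random() * sum(weights)
            for loc, w in zip(locations, weights):
                r -= w
                if r <= 0:
                    labels[node] = loc
                    break
        if step >= 500:  # discard burn-in, then collect posterior samples
            samples[labels["b"]] += 1

    estimate = samples.most_common(1)[0][0]  # approximate posterior mode for "b"
    ```

    Since two of b's three friends are in NYC, the sampled posterior concentrates on NYC; the real model combines this network term with a text term in one Markov random field.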

    A methodology for photometric validation in vehicles visual interactive systems.

    No full text
    This work proposes a methodology for automatically validating the internal lighting system of an automobile by assessing the visual quality of each instrument in an instrument cluster (IC) (i.e., vehicle gauges, such as speedometer, tachometer, temperature and fuel gauges) based on the user’s perceptions. Although the visual quality assessment of an instrument is a subjective matter, it is also influenced by some of its photometric features, such as the light intensity distribution. This work presents a methodology for identifying and quantifying non-homogeneous regions in the lighting distribution of these instruments, starting from a digital image. In order to accomplish this task, a set of 107 digital images of instruments were acquired and preprocessed, identifying a set of instrument regions. These instruments were also evaluated by common drivers and specialists to identify their non-homogeneous regions. Then, for each region, we extracted a set of homogeneity descriptors, and also proposed a relational descriptor to study the homogeneity influence of a region on the whole instrument. These descriptors were associated with the results of the manual labeling, and given to two machine learning algorithms, which were trained to identify a region as being homogeneous or not. Experiments showed that the proposed methodology obtained an overall precision above 94% for both region and instrument classifications. Finally, a meticulous analysis of the users’ and specialists’ image evaluations is performed.
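
    The descriptor-then-classify pipeline can be illustrated with one simple homogeneity descriptor and a toy decision rule (both hypothetical; the paper extracts several descriptors per region and trains two machine learning algorithms on manually labeled data):

    ```python
    def homogeneity_descriptors(region):
        """Mean intensity and variance for a region given as a 2-D list of pixels."""
        pixels = [p for row in region for p in row]
        mean = sum(pixels) / len(pixels)
        var = sum((p - mean) ** 2 for p in pixels) / len(pixels)
        return {"mean": mean, "variance": var}

    def is_homogeneous(region, var_threshold=50.0):
        """Toy stand-in for the trained classifiers: low intensity variance
        is taken as evidence of an evenly lit (homogeneous) region."""
        return homogeneity_descriptors(region)["variance"] < var_threshold

    # Hypothetical 2x2 pixel regions from an instrument image.
    even = [[200, 201], [199, 200]]    # evenly lit region
    patchy = [[250, 40], [245, 35]]    # bright spot next to a dark patch
    ```

    In the actual methodology the threshold is not hand-set: the descriptors are fed, together with the drivers' and specialists' labels, to learning algorithms that induce the decision boundary.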